Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC

Authors

  • Kapil Arya
  • Gene Cooperman
  • Andrea Dotti
  • Peter Elmer
Abstract

Process checkpoint-restart is a technology with great potential for use in HEP workflows. Use cases include debugging; reducing the startup time of applications, both in offline batch jobs and in the High Level Trigger; permitting job preemption in environments where spare CPU cycles are used opportunistically; and efficiently scheduling a mix of multicore and single-threaded jobs. We report on tests of checkpoint-restart technology using CMS software, Geant4-MT (multi-threaded Geant4), and the DMTCP (Distributed Multithreaded Checkpointing) package. We analyze both single- and multi-threaded applications and test them on both standard Intel x86 architectures and Intel MIC. The tests with multi-threaded applications on Intel MIC are used to assess scalability and performance. These tests are considered an indicator of what the future may hold for many-core computing.
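The abstract does not describe the mechanics, so the following is a minimal, hypothetical Python sketch of the checkpoint-restart idea behind the startup-time and preemption use cases. It uses application-level checkpointing, a different and much simpler technique than DMTCP, which transparently checkpoints an unmodified running process (including its threads) without changes to the application's code. All names here (`run`, `save_checkpoint`, `load_checkpoint`, `expensive_initialization`, and the state layout) are illustrative assumptions, not part of CMS software, Geant4-MT, or DMTCP.

```python
import os
import pickle
import tempfile


def expensive_initialization():
    # Hypothetical stand-in for costly startup work, e.g. loading
    # geometry or conditions data before the first event is processed.
    return {"table": [i * i for i in range(1000)]}


def save_checkpoint(state, path):
    # Write to a temporary file in the same directory, then atomically
    # rename it over the old checkpoint, so a job killed mid-write never
    # leaves a truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    # Return the saved state, or None if no checkpoint exists yet.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(n_events, path, checkpoint_every=10, preempt_at=None):
    """Process events 0..n_events-1, resuming from a checkpoint if one exists.

    preempt_at simulates the batch system killing the job just before that
    event; the function then returns None without saving further state.
    """
    state = load_checkpoint(path)
    if state is None:
        # Fresh start: pay the startup cost once, then checkpoint immediately
        # so later restarts skip it.
        state = {"init": expensive_initialization(), "next_event": 0, "total": 0}
        save_checkpoint(state, path)

    for event in range(state["next_event"], n_events):
        if preempt_at is not None and event == preempt_at:
            return None  # simulated preemption; in-memory progress is lost
        state["total"] += state["init"]["table"][event % 1000]
        state["next_event"] = event + 1
        if state["next_event"] % checkpoint_every == 0:
            save_checkpoint(state, path)

    return state["total"]
```

Calling `run(50, path, preempt_at=25)` stops early; a second call `run(50, path)` reloads the checkpoint written after event 19, reprocesses events 20 to 24, and returns the same total as an uninterrupted run. Work done since the last checkpoint is repeated, which is the usual trade-off when choosing a checkpoint interval.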


Similar Resources

Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

We present and compare the performances of two many-core architectures: the Nvidia Kepler and the Intel MIC both in a single system and in cluster configuration for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We present data also for a traditional high-end multi...

Investigation of Portable Event Based Monte Carlo Transport Using the Nvidia Thrust Library

Power consumption considerations are driving future high performance computing platforms toward many-core computing architectures. The Trinity machine to become available at Los Alamos National Laboratory in 2016 will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) architecture coprocessors. The Sierra machine to be available at Lawrence Live...

Many-core applications to online track reconstruction in HEP experiments

Interest in parallel architectures applied to real time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphic Processing Units (GPUs) and Intel Many Integrated Core architecture (MIC) when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as benchmark a scal...

A Generic Checkpoint-Restart Mechanism for Virtual Machines

It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual ma...

A quantitative Comparison of Checkpoint with Restart and Replication in Volatile Environments

Volatile computing environments such as desktop grids differ from traditional systems in the high volatility of compute nodes in both reachability and availability of compute resources. As a result, different fault-tolerant techniques are required to ensure efficient execution of parallel jobs. This technical report summarizes failure and availability patterns of distributed computing systems; ...

Journal title:
  • CoRR

Volume: abs/1311.0272

Publication date: 2013